Abstract

We survey models and algorithms for stream verification. 


1 Problem Definition 

Stream verification is concerned with the following setting. A computationally limited client wants to
compute some property of a massive input, but lacks the resources to store even a small fraction of the input,
and hence cannot perform the desired computation locally. The client therefore accesses a powerful but
untrusted service provider (e.g., a commercial cloud computing service), who not only performs the requested
computation, but also proves that the answer is correct. An array of closely related models has been
introduced to capture this scenario. The following section provides a unified presentation of these models,
emphasizing their common features before delineating their differences.

Stream Verification Model. Let $a = (a_1, a_2, \ldots, a_m)$ be a data stream, where each $a_i$ comes from a data
universe $U$ of size $n$, and let $F$ be a function mapping data streams to a finite range $\mathcal{R}$. A stream verification
protocol for $F$ involves two parties: a prover P, and a (randomized) verifier V. The protocol consists of two
stages: a stream observation stage and a proof verification stage.

In the stream observation stage, V processes the stream $a$, subject to the standard constraints of the
data-stream model, i.e., sequential access to $a$ and limited memory. In the proof verification stage, V and P
exchange a sequence of one or more messages, and afterward V outputs a value $b$. V is allowed to output a
special symbol $\bot$ indicating a rejection of P’s claims. Formally, V constitutes a stream verification protocol
if the following two properties are satisfied.

• Completeness: There is some prover strategy P such that, for all streams $a$, the probability that V
outputs $F(a)$ after interacting with P is at least 2/3.

• Soundness: For all streams $a$ and all prover strategies P, the probability that V outputs a value not in
$\{F(a), \bot\}$ after interacting with P is at most $\varepsilon < 1/3$.

Here, the probabilities are taken over V’s internal randomness. The constants 2/3 and 1/3 are not
essential and are chosen by convention. The parameter $\varepsilon$ is referred to as the soundness error of the protocol.

Costs. There are five primary costs in any stream verification protocol: (1) V’s space usage, (2) the total
communication cost, (3) V’s runtime, (4) P’s runtime, and (5) the number of messages exchanged.

Differences Between Models. There are three primary differences between the various models of stream
verification that have been put forth in the literature. The first is whether the soundness condition is
required to hold against all cheating provers (such protocols are called information-theoretically or statistically
sound), or only against cheating provers that run in polynomial time (such protocols are called
computationally sound). The second is the amount and format of the interaction allowed between P and V. The third
is the temporal relationship between the stream observation and proof verification stages. In particular,
several models permit P and V to exchange messages before and during the stream observation stage, and
sometimes permit the prover’s messages to depend on parts of the data stream that V has not yet seen. In
general, more permissive models allow a larger class of problems to be solved efficiently, but may yield
protocols that are less realistic.

Summary of Models. The annotated data streaming (ADS) model, introduced by Chakrabarti et al. [3], is
non-interactive: P is permitted to send a single message to V, with no communication allowed in the reverse
direction. Technically, this model permits the contents of P’s message to be interleaved with the stream, in
which case each bit of P’s message may be viewed as an “annotation” associated with a particular stream
update. However, for most ADS protocols that have been developed, P’s message does not need to be interleaved
with the stream. Chakrabarti et al. [3] distinguish between two kinds of ADS protocols: prescient protocols,
in which the annotation sent at any given time can depend on parts of the data stream that V has not yet seen,
and online protocols, which disallow this kind of dependence.

Streaming interactive proofs (SIPs) extend the ADS model to allow the prover and verifier to exchange
many messages. This model was introduced by Cormode et al. [8].

The Arthur-Merlin streaming model was introduced by Gur and Raz [12]. This model is equivalent to
a restricted class of SIPs, in which V is only allowed to send a single message to P (which must consist
entirely of random coin tosses, in analogy with the classical complexity class AM), before receiving P’s
reply.

The model of streaming delegation was introduced by Chung et al. [5], and corresponds to SIPs that
only satisfy computational, rather than information-theoretic, soundness.

2 Key Results 

Obtaining exact answers even for basic problems in the standard data streaming model is impossible using
$o(n)$ space. In contrast, stream verification protocols with $o(n)$ space and communication costs have been
developed for (exactly solving) a wide variety of problems. Many of these protocols have adapted powerful
algebraic techniques originally developed in the classical literature on interactive proofs, particularly the
sum-check protocol of Lund et al. [17]. All of the protocols described in Sections 2.1 through 2.3 apply even to
streams in the strict turnstile update model, where universe items can be deleted as well as inserted. The
protocols in Section 2.4 do not support deletions. Unless stated otherwise, all protocols described in this
survey are online, i.e., the honest prover’s message at any given time does not depend on parts of the data
stream that V has not yet seen.

2.1 Annotated Data Streams 

Chakrabarti et al. [2] showed that prescient ADS protocols can be exponentially more powerful than
online ones for some problems. For example, there is a prescient ADS protocol with logarithmic space and
communication costs for computing the median of a sequence of numbers: P sends V the claimed median
$\tau$ at the start of the stream, and while observing the stream, V checks that $|\{j : a_j < \tau\}| \le m/2$ and
$|\{j : a_j > \tau\}| \le m/2$, which can be done using an $O(\log m)$-bit counter. Meanwhile, Chakrabarti et
al. [3] proved that any online protocol for Median with communication cost $h$ and space cost $v$ requires
$h \cdot v = \Omega(n)$, and gave an online ADS protocol achieving these communication-space tradeoffs up to
logarithmic factors.
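
The verifier's side of the prescient median protocol can be sketched in a few lines. This is only an illustration under stated assumptions, not code from [2]: the function name `check_claimed_median` is hypothetical, the claimed median is assumed to arrive before the stream, and the check uses the non-strict bounds (at most m/2 elements strictly below and at most m/2 strictly above the claim).

```python
from typing import Iterable, Optional


def check_claimed_median(stream: Iterable[int], tau: int) -> Optional[int]:
    """Verifier for a claimed median tau; returns tau on accept, None on reject.

    Only three counters are kept, so the space is O(log m) bits.
    """
    below = 0  # number of stream elements strictly less than tau
    above = 0  # number of stream elements strictly greater than tau
    m = 0      # stream length seen so far
    for a in stream:
        m += 1
        if a < tau:
            below += 1
        elif a > tau:
            above += 1
    # Accept only if at most half the stream lies on either side of tau.
    if 2 * below <= m and 2 * above <= m:
        return tau
    return None
```

Because the claim must be fixed before the stream is seen, the honest prover needs to know the whole stream in advance, which is exactly what makes the protocol prescient rather than online.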

Chakrabarti et al. [3] also gave online ADS protocols achieving identical tradeoffs between space and
communication costs for problems including Frequency Moments and Frequent Items, and used a
lower bound due to Klauck [13] on the Merlin-Arthur communication complexity of the Set-Disjointness
function to show that these tradeoffs are optimal for these problems, even among prescient protocols.
Subsequent work [7, 23] gave similarly optimal online ADS protocols for several more problems, including
maximum matching and counting triangles in graphs, and matrix-vector multiplication. Chakrabarti et al.
[1] gave optimized protocols for streams whose length $m$ is much smaller than the universe size $n$.

2.2 Streaming Interactive Proofs 

Cormode et al. [8] showed that several general protocols from the classical literature on interactive proofs
can be simulated in the SIP model. In particular, this includes a powerful, general-purpose protocol due to
Goldwasser, Kalai, and Rothblum [10] (henceforth, the GKR protocol). Given any problem computed by
an arithmetic or Boolean circuit of polynomial size and polylogarithmic depth, the GKR protocol requires
only polylogarithmic space and communication while using polylogarithmic rounds of verifier-prover
interaction. This yields SIPs for exactly solving many basic streaming problems with polylogarithmic space
and communication costs, including Frequency Moments, Frequent Items, and Graph Connectivity.
Cormode et al. [8] also gave optimized protocols for specific problems, including Frequency
Moments (see the Detailed Example below).

Gur and Raz [12] gave an Arthur-Merlin streaming protocol for the Distinct Elements problem
with communication cost $\tilde{O}(h)$ and space cost $\tilde{O}(v)$ for any $h, v$ satisfying $h \cdot v \ge n$, where the $\tilde{O}$ notation
hides factors that are polylogarithmic in $n$. Klauck and Prakash [15] extended this protocol to give an SIP
for Distinct Elements with polylogarithmic space and communication costs and logarithmically many
rounds of prover-verifier interaction.

Chakrabarti et al. [4] gave constant-round online SIPs with logarithmic space and communication costs
for many problems, including Index, Range-Counting, and Nearest-Neighbor Search. These
protocols are exponentially more efficient than what can be achieved by constant-round online SIPs in which
V’s messages to the prover are independent of the input (such as is required in the Arthur-Merlin streaming
model of [12]). For classical interactive proofs where the verifier is not restricted to be streaming, allowing
V’s messages to depend on the input does not yield analogous efficiency improvements: Goldwasser and
Sipser [11] showed that any interactive proof can be simulated with a polynomial blowup in all costs by an
interactive proof in which V’s messages to P consist entirely of random coin tosses.

2.3 Computationally Sound Protocols 

Computationally sound protocols may achieve properties that are unattainable in the information-theoretic
setting. For example, they typically achieve reusability, allowing the verifier to use the same randomness
to answer many queries. In contrast, most SIPs only support “one-shot” queries, because they require the
verifier to reveal secret randomness to the prover over the course of the protocol.

Chung et al. [5] combined the GKR protocol with fully homomorphic encryption (FHE) to give reusable
two-message streaming delegation protocols with polylogarithmic space and communication costs for any
problem in the complexity class NC. They also gave reusable four-message protocols with polylogarithmic
space and communication costs for any problem in the complexity class P.

Papamanthou et al. [18] gave improved streaming delegation protocols for a class of low-complexity
queries including point queries and range search: these protocols avoid the use of FHE, and allow the prover
to answer such queries in polylogarithmic time. (In contrast, protocols based on the GKR protocol [10]
require the prover to spend time quasilinear in the size of the data stream after receiving a query, even if the
answer itself can be computed in sublinear time.) The protocols of [18] are based on hash trees.

2.4 Models Allowing Linear Total Communication 

The literature on stream verification contains two models that depart from those described above in that they
allow P and V to communicate on each stream update. In particular, they allow a linear amount of total
communication, but aim to bound the maximum amount of communication exchanged (and processing time
spent) during any stream update.

A Statistically Sound Model. Klauck and Prakash [14] studied a variant of online SIPs, in which P and
V are allowed to exchange a constant number of messages during each stream update, as long as at most a
constant number of machine words are exchanged during each update. They gave protocols in this model for
Median, for determining if a matrix is full-rank, and for approximating the longest-increasing subsequence
of the stream.

A Computationally Sound Model. Schröder and Schröder [20] introduced a model requiring only
computational soundness, which they called the verifiable data streaming (VDS) model. This model is targeted at
settings in which there are many parties who may wish to verify answers returned by the prover. In the VDS
model, V is allowed to send a short message to P on each stream update, with no communication allowed
in the reverse direction. VDS protocols are required to be reusable and publicly verifiable, in the sense that
any party who knows V’s public key can verify any response by the prover. The VDS model does not permit
V to update the public key on every stream update, as this would require coordination between all parties
wishing to verify information returned by the prover.

There is a trivial VDS protocol for the Index problem, requiring polylogarithmic communication on
each stream update. Recall that in the Index problem, the data stream specifies all entries of a vector
$x = (x_1, \ldots, x_n) \in \{0,1\}^n$, followed by an index $i \in [n]$, and the goal is to output $x_i$. In the trivial VDS
protocol, for every stream update $x_j$, V sends to P a digital signature of the tuple $(j, x_j)$. After the stream
observation phase, P sends to V the claimed value of $x_i$ along with a valid digital signature of the tuple
$(i, x_i)$. The protocol is secure assuming the prover cannot forge valid signatures: under this assumption,
the only way P can efficiently produce a valid signature of the tuple $(i, x_i)$ is if V had previously sent the
signature to P.
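
The message flow of the trivial protocol can be sketched as follows. This is a toy illustration, not the construction of any cited work: the class and function names are hypothetical, and an HMAC tag from the Python standard library stands in for the digital signature. That substitution is a significant simplification, since an HMAC can only be checked by the holder of the secret key, so this sketch is not publicly verifiable as a real VDS protocol must be.

```python
import hashlib
import hmac
import os
from typing import Dict, Optional, Tuple


def sign(key: bytes, j: int, x_j: int) -> bytes:
    """Stand-in 'signature' of the tuple (j, x_j). ASSUMPTION: HMAC replaces a real signature."""
    msg = f"{j}:{x_j}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()


class Prover:
    """Stores every entry together with the tag the verifier sent for it."""

    def __init__(self) -> None:
        self.store: Dict[int, Tuple[int, bytes]] = {}

    def receive(self, j: int, x_j: int, tag: bytes) -> None:
        self.store[j] = (x_j, tag)

    def answer(self, i: int) -> Tuple[int, bytes]:
        return self.store[i]


class Verifier:
    """Keeps only a key; sends one tag per stream update and checks the final answer."""

    def __init__(self) -> None:
        self.key = os.urandom(32)

    def on_update(self, prover: Prover, j: int, x_j: int) -> None:
        # One short message from V to P per stream update, nothing in return.
        prover.receive(j, x_j, sign(self.key, j, x_j))

    def check(self, i: int, claimed: int, tag: bytes) -> Optional[int]:
        # Accept the claimed x_i only if the tag matches the tuple (i, claimed).
        ok = hmac.compare_digest(tag, sign(self.key, i, claimed))
        return claimed if ok else None
```

The verifier's state is a single key regardless of the stream length, while the prover stores all entries and tags; a tag issued for one tuple is useless for any other index or value.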

Schröder and Schröder [20] gave a VDS protocol that supports Index queries in a more general setting
in which the stream not only contains entries of the vector $x$, but also contains Update operations. Here,
an Update operation changes the value of a designated entry $x_j$ of $x$ to a new value $x'$. However, in the
protocol of [20], each Update operation requires bidirectional communication (one message in each
direction between P and V), as well as an update to V’s public key.

Krupp et al. [16] gave related VDS protocols that reduce costs by logarithmic factors.

Input: V is given oracle access to a $v$-variate polynomial $g$ over a finite field $\mathbb{F}$ and an $H \in \mathbb{F}$.

Goal: Determine whether $H = \sum_{(x_1, \ldots, x_v) \in \{0,1\}^v} g(x_1, \ldots, x_v)$.

• In the first round, P computes the univariate polynomial

$$g_1(X_1) := \sum_{(x_2, \ldots, x_v) \in \{0,1\}^{v-1}} g(X_1, x_2, \ldots, x_v),$$

and sends $g_1$ to V. V checks that $g_1$ is a univariate polynomial of degree at most $\deg_1(g)$, and that
$H = g_1(0) + g_1(1)$, rejecting if not.

• V chooses a random element $r_1 \in \mathbb{F}$, and sends $r_1$ to P.

• In the $j$th round, for $1 < j < v$, P sends to V the univariate polynomial

$$g_j(X_j) = \sum_{(x_{j+1}, \ldots, x_v) \in \{0,1\}^{v-j}} g(r_1, \ldots, r_{j-1}, X_j, x_{j+1}, \ldots, x_v).$$

V checks that $g_j$ is a univariate polynomial of degree at most $\deg_j(g)$, and that
$g_{j-1}(r_{j-1}) = g_j(0) + g_j(1)$, rejecting if not.

• V chooses a random element $r_j \in \mathbb{F}$, and sends $r_j$ to P.

• In round $v$, P sends to V the univariate polynomial

$$g_v(X_v) = g(r_1, \ldots, r_{v-1}, X_v).$$

V checks that $g_v$ is a univariate polynomial of degree at most $\deg_v(g)$, and that
$g_{v-1}(r_{v-1}) = g_v(0) + g_v(1)$, rejecting if not.

• V chooses a random element $r_v \in \mathbb{F}$ and evaluates $g(r_1, \ldots, r_v)$ with a single oracle query to $g$. V
checks that $g_v(r_v) = g(r_1, \ldots, r_v)$, rejecting if not.

• If V has not yet rejected, V halts and accepts.

Figure 1: Description of the sum-check protocol. $\deg_i(g)$ denotes the degree of $g$ in the $i$th variable.
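
The rounds of Figure 1 can be simulated directly. The sketch below is a toy implementation under several assumptions that are not part of the survey: the field is the integers modulo the prime 2^61 - 1, the polynomial `g` is a Python callable standing in for the verifier's oracle, every variable has the same degree bound `deg` (at least 1), and each round polynomial is sent as its values at the points 0, 1, ..., `deg`, which enforces the degree bound automatically. All function names are hypothetical, and the honest prover runs in exponential time, so this is for small examples only.

```python
import itertools
import random
from typing import Callable, List, Sequence

P_MOD = 2**61 - 1  # ASSUMPTION: the field F is the integers modulo this prime

Poly = Callable[[Sequence[int]], int]


def lagrange_eval(ys: Sequence[int], r: int) -> int:
    """Evaluate at r the unique polynomial of degree < len(ys) taking values ys at 0..len(ys)-1."""
    total = 0
    n = len(ys)
    for i in range(n):
        num, den = 1, 1
        for k in range(n):
            if k != i:
                num = num * (r - k) % P_MOD
                den = den * (i - k) % P_MOD
        total = (total + ys[i] * num * pow(den, -1, P_MOD)) % P_MOD
    return total


def honest_round(g: Poly, v: int, deg: int, prefix: List[int]) -> List[int]:
    """Honest prover: values of the round polynomial at 0..deg, summing over Boolean suffixes."""
    rest = v - len(prefix) - 1
    evals = []
    for t in range(deg + 1):
        s = 0
        for suffix in itertools.product((0, 1), repeat=rest):
            s = (s + g(prefix + [t] + list(suffix))) % P_MOD
        evals.append(s)
    return evals


def sum_check(g: Poly, v: int, deg: int, claimed: int,
              prover=honest_round, rng=random) -> bool:
    """Verifier: True iff every round check and the final oracle check pass (requires deg >= 1)."""
    expected = claimed % P_MOD  # H in round 1, then g_{j-1}(r_{j-1})
    rs: List[int] = []
    for _ in range(v):
        evals = prover(g, v, deg, list(rs))
        if len(evals) != deg + 1:
            return False
        if (evals[0] + evals[1]) % P_MOD != expected:
            return False
        r = rng.randrange(P_MOD)
        expected = lagrange_eval(evals, r)  # g_j(r_j), checked in the next round
        rs.append(r)
    # Final check: one oracle query to g at the random point.
    return g(rs) % P_MOD == expected
```

For instance, with `g = lambda x: x[0] * x[1] + 2 * x[2]` the true sum over the Boolean cube is 10, so `sum_check(g, 3, 1, 10)` accepts, while an honest prover paired with the false claim 11 is rejected in the first round.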

2.5 Implementations 

Implementations of the GKR protocol were provided by Cormode et al. [6] and Thaler [22]. Cormode
et al. [6] also provided optimized implementations of several ADS protocols from [3, 7]. Thaler et al. [24]
provided parallelized implementations using Graphics Processing Units.

Qian [19] refined and implemented the streaming delegation protocol for point queries and range search
of Papamanthou et al. [18].

Implementations of VDS protocols are described by Schröder and Simkin [21] and Krupp et al. [16].


3 Detailed Example 

As described in [8], the sum-check protocol can be directly applied to give an SIP for the $k$th frequency
moment problem with $\log n$ rounds of prover-verifier interaction, and $O(\log^2(n))$ space and communication
costs. The sum-check protocol is described in Figure 1.

Properties and Costs of the Sum-check Protocol. The sum-check protocol satisfies perfect completeness,
and has soundness error $\varepsilon \le v \cdot \deg(g)/|\mathbb{F}|$, where $\deg(g) := \max_i \deg_i(g)$ denotes the maximum degree of
$g$ in any variable (see [17] for a proof). There is one round of prover-verifier interaction in the sum-check
protocol for each of the $v$ variables of $g$, and the total communication is $O(v \cdot \deg(g))$ field elements.
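
As a rough worked instance of these bounds (the parameter values are illustrative assumptions, not taken from the survey), consider a polynomial in $v = 20$ variables with degree at most 2 in each variable, over a field of size at least $2^{40}$:

$$\varepsilon \le \frac{v \cdot \deg(g)}{|\mathbb{F}|} \le \frac{20 \cdot 2}{2^{40}} = \frac{40}{2^{40}} \approx 3.6 \times 10^{-11}.$$

If each round polynomial is specified by $\deg(g) + 1 = 3$ field elements, the prover sends $20 \cdot 3 = 60$ field elements in total, in line with the $O(v \cdot \deg(g))$ bound.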

Note that as described in Figure 1, the sum-check protocol assumes that the verifier has oracle access to
g. However, this will not be the case in applications, as g will ultimately be a polynomial that depends on 
the input data stream. 

The SIP for Frequency Moments. In the $k$th frequency moments problem, the goal is to output $\sum_{i \in U} f_i^k$,
where $f_i$ is the number of times item $i$ appears in the data stream $a$. For a vector $\mathbf{i} = (i_1, \ldots, i_{\log n}) \in
\{0,1\}^{\log n}$, let $\chi_{\mathbf{i}}(x_1, \ldots, x_{\log n}) = \prod_{\ell=1}^{\log n} \chi_{i_\ell}(x_\ell)$, where $\chi_0(x_\ell) = 1 - x_\ell$ and $\chi_1(x_\ell) = x_\ell$. $\chi_{\mathbf{i}}$ is the
unique multilinear polynomial that maps $\mathbf{i} \in \{0,1\}^{\log n}$ to 1 and all other values in $\{0,1\}^{\log n}$ to 0, and it is
referred to as the multilinear extension of $\mathbf{i}$.
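
The product formula for the multilinear extension of a Boolean vector translates into a short loop. The sketch below is illustrative only: the name `chi` and the prime modulus are assumptions, with the integers modulo 2^61 - 1 standing in for the field.

```python
from typing import Sequence

P_MOD = 2**61 - 1  # ASSUMPTION: illustrative prime field, not fixed by the text


def chi(bits: Sequence[int], x: Sequence[int]) -> int:
    """Evaluate the multilinear extension of the Boolean vector `bits` at the point x.

    Each coordinate contributes x_l when the bit is 1 and (1 - x_l) when the bit is 0.
    """
    acc = 1
    for b, x_l in zip(bits, x):
        factor = x_l if b == 1 else (1 - x_l)
        acc = acc * factor % P_MOD
    return acc
```

On Boolean inputs the function acts as an indicator: `chi([1, 0], [1, 0])` is 1, while `chi([1, 0], [1, 1])` is 0. At non-Boolean points it returns a general field element.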

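For each $i \in U$, associate $i$ with a vector $\mathbf{i} \in \{0,1\}^{\log n}$ in the natural way, and let $\mathbb{F}$ be a finite field with
$n^k < |\mathbb{F}| < 4 \cdot n^k$. Define the polynomial $\hat{f} : \mathbb{F}^{\log n} \to \mathbb{F}$ via

$$\hat{f} = \sum_{\mathbf{i} \in \{0,1\}^{\log n}} f_i \cdot \chi_{\mathbf{i}}. \qquad (1)$$

Note that $\hat{f}$ is the unique multilinear polynomial satisfying the property that $\hat{f}(\mathbf{i}) = f_i$ for all $\mathbf{i} \in \{0,1\}^{\log n}$.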

The $k$th frequency moment of $a$ is equal to $\sum_{i \in U} f_i^k = \sum_{\mathbf{i} \in \{0,1\}^{\log n}} (\hat{f}^k)(\mathbf{i})$. Hence, in order
to compute the $k$th frequency moment of $a$, it suffices to apply the sum-check protocol to the polynomial
$g = \hat{f}^k$. This requires $\log n$ rounds of prover-verifier interaction, and since the total degree of $\hat{f}^k$ is $k \cdot \log n$,
the total communication cost is $O(k \log n)$ field elements, which require $O(k^2 \log^2 n)$ total bits to specify.

At the end of the sum-check protocol, V must compute $g(r_1, \ldots, r_{\log n}) = (\hat{f}^k)(r_1, \ldots, r_{\log n})$ for
randomly chosen $(r_1, \ldots, r_{\log n}) \in \mathbb{F}^{\log n}$. It suffices for V to evaluate $z := \hat{f}(r_1, \ldots, r_{\log n})$, since
$(\hat{f}^k)(r_1, \ldots, r_{\log n}) = z^k$. The following lemma establishes that V can evaluate $z$ with a single pass over $a$,
while storing $O(\log n)$ field elements.

Lemma 1. V can compute $z = \hat{f}(r_1, \ldots, r_{\log n})$ with a single streaming pass over $a$, while storing $O(\log n)$
field elements.

Proof. Given any stream update $a_j \in U$, let $\mathbf{a}_j \in \{0,1\}^{\log n}$ denote the binary vector associated with
$a_j$. It follows from Equation (1) that $\hat{f}(r_1, \ldots, r_{\log n}) = \sum_{j=1}^{m} \chi_{\mathbf{a}_j}(r_1, \ldots, r_{\log n})$. Thus, V can compute
$\hat{f}(r_1, \ldots, r_{\log n})$ incrementally from the raw stream by initializing $\hat{f}(r_1, \ldots, r_{\log n}) \leftarrow 0$, and processing
each update $a_j$ via:

$$\hat{f}(r_1, \ldots, r_{\log n}) \leftarrow \hat{f}(r_1, \ldots, r_{\log n}) + \chi_{\mathbf{a}_j}(r_1, \ldots, r_{\log n}).$$

V only needs to store $(r_1, \ldots, r_{\log n})$ and $\hat{f}(r_1, \ldots, r_{\log n})$, which is $O(\log n)$ field elements in total. □
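
The incremental update in the proof can be sketched as a single loop over the stream. The code below is an illustration under assumptions of its own: the field is the integers modulo the prime 2^61 - 1, universe items are integers in [0, n), and the "natural" binary encoding is taken to be least-significant-bit first. The names `to_bits`, `chi` and `stream_eval` are hypothetical.

```python
from typing import Iterable, List, Sequence

P_MOD = 2**61 - 1  # ASSUMPTION: illustrative prime field


def to_bits(item: int, num_bits: int) -> List[int]:
    """Binary vector of a universe item (ASSUMPTION: least significant bit first)."""
    return [(item >> l) & 1 for l in range(num_bits)]


def chi(bits: Sequence[int], r: Sequence[int]) -> int:
    """Multilinear extension of the Boolean vector `bits`, evaluated at r."""
    acc = 1
    for b, r_l in zip(bits, r):
        acc = acc * (r_l if b == 1 else (1 - r_l)) % P_MOD
    return acc


def stream_eval(stream: Iterable[int], r: Sequence[int]) -> int:
    """One pass over the stream; the state is r plus a single running field element z."""
    z = 0
    for a_j in stream:
        # Each update adds the multilinear extension of the item's bit vector at r.
        z = (z + chi(to_bits(a_j, len(r)), r)) % P_MOD
    return z
```

Evaluating at a Boolean point recovers a frequency: for the stream `[3, 0, 3, 1, 3, 2, 0]` over a universe of size 4, `stream_eval(stream, to_bits(3, 2))` returns 3, the number of occurrences of item 3. At a random point the verifier would raise the result to the power k to obtain the value needed for the final sum-check test.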


4 Open Problems 

• Two-message online SIP protocols with logarithmic space and communication costs are known for
several functions, including the Index function (cf. [4]). It is also known that existing techniques
cannot yield 2- or 3-message online SIPs of polylogarithmic cost for the Median or Frequency
Moments problems. However, the following is open: exhibit an explicit function $F : \{0,1\}^n \to \{0,1\}$
that cannot be computed by any online two-message SIP with communication and space costs
both bounded above by $h$, for some $h = \omega(\log n)$. See [4] for details. Candidate functions satisfying
this property include Set-Disjointness and Inner-Product-Mod-2.

• For several functions $F : \{0,1\}^n \to \{0,1\}$, it is known that any online ADS protocol for $F$ with
communication cost $h$ and space cost $v$ requires $h \cdot v = \Omega(n)$. This lower bound is tight in many
cases, such as for the Index function (cf. [3]). However, the following is open: exhibit an explicit
function that cannot be computed by any online ADS protocol with communication and space costs 
both bounded above by h, for some h = 

• Chakrabarti et al. [2] and Thaler [23] have also identified explicit problems which are just as hard for
online ADS protocols as they are in the standard data streaming model, in the sense that the sum of
V’s space cost and the communication cost in any online ADS protocol must be at least as large as
the space complexity of a standard streaming algorithm for the problem, up to constant or logarithmic
factors. However, it is open to exhibit an explicit function $F$ that is just as hard for prescient ADS
protocols as it is in the standard streaming model.

Formally, the following is open: exhibit an explicit function $F$ such that, for any constant $\delta > 0$, $F$
cannot be computed by any prescient ADS protocol with communication and space cost both bounded
above by $O(\mathrm{stream}(F)^{1-\delta})$. Here, $\mathrm{stream}(F)$ denotes the minimum space complexity of any streaming
algorithm that, for any input stream $a$, outputs $F(a)$ with probability at least 2/3. Candidate
functions include Graph Connectivity and Graph Bipartiteness.

For all three problems, functions satisfying the relevant properties are known to exist by counting
arguments. But as stated above, identifying explicit functions satisfying the properties remains open.